
Introduction

This document describes the data created for APRA’s fundraising data science online learning courses and workshops. All of the data created for these purposes is fictitious.

There are three data sets available as of 2020-08-12:

  • Biographical (donor level)
  • Giving (gift level)
  • Engagement (donor level)

Each of these tables and the variables contained within each are described below. There are tabs included throughout the document that can be used to explore the variables included in each data set.

These data sets are designed to mirror realistic fundraising data and are not intended to be perfectly “clean”. Common fundraising data challenges, such as missing values and inconsistently formatted labels, are built into the data files. You can click on the Biographical Data tab above to learn more about that data set.

All of the code for this project is available on GitHub. The code that generates the data sets can be found in the generate_data.R script.

The individual data sets can be read into R directly from GitHub as follows:

# load the tidyverse and knitr packages
library(tidyverse)
library(knitr)

# read bio data csv into R and store in a data frame named bio
bio <- read_csv("https://raw.githubusercontent.com/majerus/apra_data_science_courses/master/bio_data_table.csv")

# display 10 randomly sampled donors and a subset of columns as a table
bio %>% 
  sample_n(10) %>% 
  select(id, name, birthday, city, state, capacity, capacity_source) %>%
  kable()

| id      | name                | birthday   | city          | state | capacity      | capacity_source |
|---------|---------------------|------------|---------------|-------|---------------|-----------------|
| 5399183 | al-Sabet, Haniyya   | 1966-12-15 | Birmingham    | AL    | $75k - $100k  | screening       |
| 4999665 | Oakley, Kevin       | 1959-12-20 | Fox island    | WA    | $75k - $100k  | screening       |
| 7977859 | Clayton, Marisa     | 1957-12-22 | Chicago       | IL    | $10k - $25k   | screening       |
| 2845485 | Minjarez, Brandon   | 1956-06-30 | Detroit       | MI    | $50k - $75K   | screening       |
| 3103314 | Vaz, Surafale       | 1937-11-13 | Sun city west | AZ    | $5k - $10k    | screening       |
| 3720048 | Nguyen, Matthew     | 1980-06-05 | Gainesville   | GA    | $25k - $50k   | NA              |
| 2168995 | el-Ishak, Saleet    | 1969-09-02 | Steubenville  | OH    | $100k - $250k | screening       |
| 6188537 | Burchfield, Vincent | 1992-04-04 | Iselin        | NJ    | NA            | institutional   |
| 5247695 | el-Mansouri, Maazin | 1966-10-11 | Hope          | AR    | NA            | screening       |
| 6950846 | Lai, Kayla          | NA         | Flint         | MI    | $50k - $75K   | institutional   |

Biographical Data

The biographical data has 14 variables and 100,000 observations. The data is stored at the donor level. Each row of the data represents a unique donor and biographical information about that donor.

Numeric Variables

There are 4 numeric variables:

  • id: A seven-digit numeric id that is unique to each donor.
  • household_id: A seven-digit numeric id that is unique to each household. More than one donor may share a household_id.
  • lat: The latitude of the center point of each donor’s zipcode. Missing for donors residing outside the United States.
  • lon: The longitude of the center point of each donor’s zipcode. Missing for donors residing outside the United States.
## Rows: 100,000
## Columns: 4
## $ id           <dbl> 9671621, 6098249, 7804098, 2065649, 2290208, 2566581, 88…
## $ household_id <dbl> 1000202, 1000504, 1000843, 1000843, 1000856, 1000856, 10…
## $ lat          <dbl> 45.49, 38.03, 42.23, 38.60, 29.43, 36.14, 33.98, 42.11, …
## $ lon          <dbl> -122.72, -78.48, -91.19, -89.68, -95.24, -115.27, -118.0…
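
Because several donors can share a household_id, and lat/lon are missing for donors outside the United States, two quick checks are often useful. The sketch below uses a small made-up stand-in for the bio table (bio_toy and all of its values are hypothetical) and base R only, so it runs without downloading the data.

```r
# Toy stand-in for the bio table; ids and coordinates are made up
bio_toy <- data.frame(
  id           = c(1000001L, 1000002L, 1000003L, 1000004L),
  household_id = c(2000001L, 2000001L, 2000002L, 2000003L),
  lat          = c(45.49, 45.49, NA, 38.03),
  lon          = c(-122.72, -122.72, NA, -78.48)
)

# Count the donors in each household
household_size <- table(bio_toy$household_id)

# Households that are shared by more than one donor
shared_households <- names(household_size[household_size > 1])

# Donors without coordinates (documented as residing outside the United States)
missing_coords <- bio_toy$id[is.na(bio_toy$lat) | is.na(bio_toy$lon)]
```

The same expressions should work on the full bio data frame once it has been read in.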

Character Variables

When loaded with the default settings, there are 9 character variables:

  • name: Each donor’s first and last name formatted as “last name, first name”.
  • country: Each donor’s country of residence.
  • city: Each donor’s city of residence.
  • deceased: A binary indicator of whether a donor is deceased (“Y”|“N”).
  • zip: The five-digit zipcode of donors whose country of residence is the United States.
  • state: The two-letter state abbreviation for each donor whose country of residence is the United States.
  • capacity: Each donor’s capacity represented within an estimated range.
  • capacity_source: A categorical variable indicating how the capacity was determined (“institutional”|“screening”).
  • race: A categorical variable indicating the donor’s race.
## Rows: 100,000
## Columns: 9
## $ name            <chr> "Tran, Ntxuam", "Probeck, William", "Buchanan-Sam, Ki…
## $ country         <chr> "United States", "United States", "United States", "U…
## $ city            <chr> "Portland", "Charlottesville", "Monticello", "Trenton…
## $ deceased        <chr> "Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y"…
## $ zip             <chr> "97221", "22903", "52310", "62293", "77511", "89117",…
## $ state           <chr> "OR", "VA", "IA", "IL", "TX", "NV", "CA", "IL", "NM",…
## $ capacity        <chr> "$75k - $100k", "$25k - $50k", "$100k - $250k", "$25k…
## $ capacity_source <chr> "screening", "institutional", "screening", "screening…
## $ race            <chr> "Native Americans or Alska Natives", "Asian", "Non-Hi…
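
Since name is stored as “last name, first name”, it can be split into two columns when needed. This is a minimal base R sketch that splits at the first comma only; the vector donor_names is a hypothetical stand-in that reuses three names from the sample table above.

```r
# Three names taken from the sample output; the vector itself is illustrative
donor_names <- c("al-Sabet, Haniyya", "Oakley, Kevin", "Lai, Kayla")

# Everything before the first comma is treated as the last name
last_name <- sub(",.*$", "", donor_names)

# Everything after the first comma, with surrounding whitespace trimmed,
# is treated as the first name
first_name <- trimws(sub("^[^,]*,", "", donor_names))
```

Names that contain no comma would be returned unchanged in both vectors, so it may be worth checking for them before relying on the split.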

[Tabs summarizing each variable: country | deceased | state | capacity | capacity_source | race]
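
The capacity labels in the sample output are not capitalized consistently (for example “$50k - $75K”), which is one of the built-in cleaning challenges. The sketch below shows one possible way to standardize the labels and extract numeric bounds. It assumes every label follows the “$Xk - $Yk” pattern seen in the sample; labels in the full data that follow another pattern would need extra handling.

```r
# Labels copied from the sample output, plus a missing value
capacity <- c("$75k - $100k", "$50k - $75K", "$5k - $10k", NA)

# Standardize the capitalization of the "k" suffix; NA stays NA
capacity_label <- gsub("K", "k", capacity, fixed = TRUE)

# Pattern for "$<lower>k - $<upper>k" (assumed from the sample labels)
pattern <- "^\\$([0-9]+)k - \\$([0-9]+)k$"

# Lower and upper bounds of each range, in dollars
capacity_lower <- as.numeric(sub(pattern, "\\1", capacity_label)) * 1000
capacity_upper <- as.numeric(sub(pattern, "\\2", capacity_label)) * 1000
```

A label that does not match the pattern is left as text by sub(), so as.numeric() would turn it into NA with a warning rather than a wrong number.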

Date Variables

There is 1 date variable:

  • birthday: The date of each donor’s birth stored as a date variable.
## Rows: 100,000
## Columns: 1
## $ birthday <date> 1924-11-18, 1922-05-11, NA, 1925-05-11, 1922-01-23, 1923-05…
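
A common use of birthday is to derive each donor’s age. The following sketch computes age in completed years as of a fixed reference date; the reference date (2020-08-12, the date mentioned in the introduction) and the variable names are choices made for this illustration. Missing birthdays stay missing.

```r
# Two birthdays from the sample table, plus a missing value
birthday <- as.Date(c("1966-12-15", "1992-04-04", NA))

# Reference date for the age calculation (an assumption for this example)
as_of <- as.Date("2020-08-12")

# Break both dates into year / month / day components
b <- as.POSIXlt(birthday)
a <- as.POSIXlt(as_of)

# Subtract one year when the birthday has not yet occurred in the reference year
not_yet <- (a$mon < b$mon) | (a$mon == b$mon & a$mday < b$mday)
age <- a$year - b$year - not_yet
```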


Giving Data

The giving data has 6 variables and 378,001 observations. The data is stored at the gift level. Each row of the data represents a unique gift and attributes associated with that gift.

Numeric Variables

There are 4 numeric variables:

  • id: A seven-digit numeric id that is unique to each donor, but can repeat in the giving data for donors with more than one gift.
  • household_id: A seven-digit numeric id that is unique to each household.
  • gift_id: A seven-digit numeric id that is unique to each gift.
  • gift_amt: The total gift amount received (i.e., total amount of cash received in USD).
## Rows: 378,001
## Columns: 4
## $ household_id <dbl> 9420483, 6312023, 6312023, 2669409, 2669409, 5199241, 51…
## $ id           <dbl> 8713532, 4279585, 8942151, 6180247, 8906224, 1131709, 41…
## $ gift_id      <dbl> 2912360, 2912405, 2912405, 2912436, 2912436, 2912487, 29…
## $ gift_amt     <dbl> 405, 1516, 721, 224, 457, 1217, 492, 286, 1962, 6653, 12…
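
Because the giving data has one row per gift, donor-level figures have to be aggregated from it. As a rough illustration, the sketch below sums gift_amt by id on a small made-up table (giving_toy and its values are hypothetical) using base R.

```r
# Toy stand-in for the giving table; ids and amounts are made up
giving_toy <- data.frame(
  id       = c(1000001L, 1000001L, 1000002L, 1000003L),
  gift_id  = c(3000001L, 3000002L, 3000003L, 3000004L),
  gift_amt = c(405, 1516, 721, 224)
)

# Total giving per donor; the result has one row per id, sorted by id
donor_totals <- aggregate(gift_amt ~ id, data = giving_toy, FUN = sum)
names(donor_totals)[2] <- "total_giving"
```

The resulting donor-level table can then be joined to the biographical data on id.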


Character Variables

When loaded with the default settings, there is 1 character variable:

  • credit_type: A categorical variable that indicates if a gift is counted as hard-credit or soft-credit.
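
When a report should count only one kind of credit, credit_type can be used to filter or group the gifts. The sketch below is illustrative only: the labels "hard" and "soft" are assumptions, since the exact values stored in credit_type are not listed here, and giving_toy is a made-up table.

```r
# Toy giving rows; the credit_type labels "hard" and "soft" are assumed
giving_toy <- data.frame(
  id          = c(1000001L, 1000002L, 1000003L, 1000003L),
  gift_id     = c(3000001L, 3000001L, 3000002L, 3000003L),
  gift_amt    = c(500, 500, 250, 100),
  credit_type = c("hard", "soft", "hard", "hard"),
  stringsAsFactors = FALSE
)

# Total amount recorded under each credit type
totals_by_credit <- tapply(giving_toy$gift_amt, giving_toy$credit_type, sum)

# Keep only the hard-credit rows and total them
hard_only  <- giving_toy[giving_toy$credit_type == "hard", ]
hard_total <- sum(hard_only$gift_amt)
```

Checking the distinct values of credit_type in the real data (for example with table()) before filtering would confirm which labels to use.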


Date Variables

There is 1 date variable:

  • gift_date: The date that each gift was received.
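
With gift_date available, gifts can be summarized over time. The following sketch totals gift amounts by calendar year on a made-up table; it assumes gift_date has been parsed as a Date, and the names giving_toy and gift_year are introduced only for this example.

```r
# Toy giving rows with made-up dates and amounts
giving_toy <- data.frame(
  gift_id   = c(3000001L, 3000002L, 3000003L, 3000004L),
  gift_amt  = c(405, 1516, 721, 224),
  gift_date = as.Date(c("2018-03-14", "2018-11-02", "2019-06-30", "2020-01-15"))
)

# Extract the calendar year of each gift
giving_toy$gift_year <- as.integer(format(giving_toy$gift_date, "%Y"))

# Total giving per calendar year, sorted by year
yearly_totals <- aggregate(gift_amt ~ gift_year, data = giving_toy, FUN = sum)
```

An organization that reports by fiscal year would need to shift the dates before extracting the year.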


Engagement Data

The engagement data has 8 variables and 100,000 observations. The data is stored at the donor level. Each row of the data represents a unique donor and attributes associated with that donor.

Numeric Variables

There are 4 numeric variables:

  • id: A seven-digit numeric id that is unique to each donor.
  • numer_of_contacts: The number of direct contacts that the donor has had with an advancement representative in the last five years. (The column name is spelled numer_of_contacts in the data.)
  • volunteer: A binary variable indicating whether a donor has volunteered in the last five years (0 = No, 1 = Yes).
  • time_on_site: The number of minutes the donor has spent on the organization’s website.
## Rows: 100,000
## Columns: 4
## $ id                <dbl> 9671621, 6098249, 3543434, 7372006, 3899439, 691544…
## $ numer_of_contacts <dbl> 0, 0, NA, 0, NA, 0, NA, NA, 0, 0, NA, 0, 2, NA, 0, …
## $ volunteer         <dbl> 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, NA,…
## $ time_on_site      <dbl> 65, NA, 362, NA, NA, NA, NA, NA, NA, 910, NA, 175, …

[Tabs summarizing each variable: numer_of_contacts | volunteer | time_on_site]
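
The numeric engagement variables contain many missing values, and a missing value is not the same as a zero. A minimal sketch for measuring missingness and computing a rate that ignores missing values is shown below; eng_toy and its values are made up, and the column names follow the glimpse output above.

```r
# Toy stand-in for the engagement table; all values are made up
eng_toy <- data.frame(
  id                = c(1000001L, 1000002L, 1000003L, 1000004L),
  numer_of_contacts = c(0, 0, NA, 2),
  volunteer         = c(1, 0, 1, NA),
  time_on_site      = c(65, NA, 362, NA)
)

# Share of missing values in each numeric engagement column
cols <- c("numer_of_contacts", "volunteer", "time_on_site")
missing_share <- colMeans(is.na(eng_toy[, cols]))

# Volunteer rate among donors whose volunteer status is known
volunteer_rate <- mean(eng_toy$volunteer, na.rm = TRUE)
```

Whether missing values should be treated as zero, imputed, or excluded depends on the analysis; the code above only excludes them.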

Character Variables

There are 3 character variables:

  • gift_officer: The gift officer currently assigned to the donor.
  • event: Indicates whether the donor has attended an event in the last year (“Y”|“N”).
  • interests: A comma-separated set of the donor’s known interests.
## Rows: 100,000
## Columns: 3
## $ gift_officer <chr> NA, NA, NA, NA, NA, NA, "Banks, Kevin", NA, NA, NA, NA, …
## $ event        <chr> "N", "N", "Y", "N", "N", "N", "N", "Y", NA, "N", "N", "Y…
## $ interests    <chr> "fashion,hunting/fishing,skiing,sports", "cars,boating/s…

[Tabs summarizing each variable: gift_officer | event | interests]
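
Since interests stores several values in one comma-separated string, it usually needs to be split before analysis. The sketch below shows one way to do this in base R. The first string is taken from the output above; the second string and all variable names are made up for illustration.

```r
# First value is from the sample output; the second is made up
interests <- c("fashion,hunting/fishing,skiing,sports", "cars,skiing", NA)

# Split each string into a character vector of individual interests
interest_list <- strsplit(interests, ",", fixed = TRUE)

# Number of interests per donor, kept missing when interests is missing
n_interests <- lengths(interest_list)
n_interests[is.na(interests)] <- NA

# How often each interest appears across all donors (NA is dropped by table())
interest_counts <- table(unlist(interest_list))

# Flag donors who list a given interest
likes_skiing <- vapply(interest_list, function(x) "skiing" %in% x, logical(1))
```

Note that likes_skiing is FALSE rather than NA for donors with no recorded interests, which may or may not be the desired treatment.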

Date Variables

There is 1 date variable:

  • gift_date: The date that each gift was received.
